Understanding approximate Fisher information for fast convergence of natural gradient descent in wide neural networks
Authors: Ryo Karakida, Kazuki Osawa
Abstract
Natural gradient descent (NGD) helps to accelerate the convergence of gradient dynamics, but it requires approximations in large-scale deep neural networks because of its high computational cost. Empirical studies have confirmed that some NGD methods with approximate Fisher information converge sufficiently fast in practice. Nevertheless, it remains unclear from a theoretical perspective why and under what conditions such heuristic approximations work well. In this work, we reveal that, under specific conditions, NGD with approximate Fisher information achieves the same fast convergence to global minima as exact NGD. We consider deep neural networks in the infinite-width limit and analyze the asymptotic training dynamics of NGD in function space via the neural tangent kernel. In function space, the training dynamics with the approximate Fisher information are identical to those with the exact Fisher information, and they converge quickly. This fast convergence holds in layer-wise approximations; for instance, in the block-diagonal approximation where each block corresponds to a layer, as well as in the block tri-diagonal and K-FAC approximations. We also find that a unit-wise approximation achieves the same fast convergence under some assumptions. All of these different approximations have an isotropic gradient in function space, and this plays a fundamental role in achieving the same convergence properties in training. Thus, the current study gives a novel and unified theoretical foundation with which to understand NGD methods in deep learning.
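The result stated above concerns the infinite-width limit, but it can be probed numerically at finite width. The sketch below is only an illustration of the comparison (not the paper's analysis or code): it builds the Gauss-Newton approximation J^T J / n of the Fisher for a small one-hidden-layer network under squared loss, computes the NGD direction with the full matrix and with a layer-wise block-diagonal approximation, and compares the two resulting updates in function space. The network size, NTK-style scaling, and damping value are choices made here for the example.

```python
# Minimal numerical sketch: exact NGD vs. layer-wise block-diagonal NGD
# on a small randomly initialized one-hidden-layer network.
# Assumptions: squared loss, Fisher approximated by the Gauss-Newton
# matrix J^T J / n, and a small damping term for invertibility.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 20, 5, 512                  # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# NTK-style parameterization: f(x) = w2 . tanh(W1 x) / sqrt(h)
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h)

def forward_and_jacobian(X):
    Z = X @ W1.T                      # pre-activations, (n, h)
    A = np.tanh(Z)                    # activations
    f = A @ w2 / np.sqrt(h)           # outputs, (n,)
    # Jacobian of f w.r.t. [vec(W1), w2], shape (n, h*d + h)
    dW1 = (w2 * (1 - A**2) / np.sqrt(h))[:, :, None] * X[:, None, :]
    dw2 = A / np.sqrt(h)
    return f, np.concatenate([dW1.reshape(n, -1), dw2], axis=1)

f, J = forward_and_jacobian(X)
r = f - y                             # residual drives the update
lam = 1e-3                            # damping

def ngd_direction(J_blocks):
    """NGD direction using a (block-)Fisher built from the given Jacobian blocks."""
    deltas = []
    for Jb in J_blocks:
        Fb = Jb.T @ Jb / n + lam * np.eye(Jb.shape[1])
        deltas.append(-np.linalg.solve(Fb, Jb.T @ r / n))
    return np.concatenate(deltas)

p1 = h * d                                             # first-layer parameter count
delta_exact = ngd_direction([J])                       # full Fisher
delta_block = ngd_direction([J[:, :p1], J[:, p1:]])    # layer-wise block diagonal

# Compare the induced function-space updates J @ delta
u_exact, u_block = J @ delta_exact, J @ delta_block
cos = u_exact @ u_block / (np.linalg.norm(u_exact) * np.linalg.norm(u_block))
print(f"cosine similarity of function-space updates: {cos:.4f}")
```

In this toy setting the two function-space updates come out nearly parallel, with magnitudes differing roughly by the number of blocks, which is in line with the isotropic-gradient picture described in the abstract (up to a rescaling of the learning rate).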
Similar resources
Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns
The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and a second momentum term in feed-forward neural networks. The analysis is conducted on 250 different words composed of three lowercase letters of the English alphabet. These words are presented to two vertical segmentation programs, designed in MATLAB, which are based on portions (1...
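For reference, the basic ingredient such studies tune is gradient descent with a momentum term. The sketch below is a generic illustration of that update rule on a simple quadratic objective, not the modified technique or the MATLAB segmentation pipeline described in the excerpt; the learning rate and momentum coefficient are arbitrary choices.

```python
# Generic gradient descent with a momentum term: the velocity accumulates
# a decaying sum of past gradients and is added to the weights each step.
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Usage on a toy quadratic with minimum at w* = (1, -2, 0.5)
w, v = np.zeros(3), np.zeros(3)
for _ in range(200):
    grad = 2 * (w - np.array([1.0, -2.0, 0.5]))
    w, v = momentum_step(w, grad, v)
print(w)  # approaches the minimum
```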
Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks
In this paper, the natural gradient descent method for multilayer stochastic complex-valued neural networks is considered, and the natural gradient is given for a single stochastic complex-valued neuron as an example. Since the space of the learnable parameters of stochastic complex-valued neural networks is not a Euclidean space but a curved manifold, the complex-valued natural gradient ...
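The update in question has the generic natural-gradient form Δw = −η F⁻¹∇L, where F is the Fisher information of the neuron's output distribution. As a hedged illustration only, the sketch below uses a real-valued logistic (Bernoulli) neuron rather than the complex-valued model of the excerpt, because its Fisher matrix has the simple closed form E[p(1−p) x xᵀ]; the data, learning rate, and damping are invented for the example.

```python
# Natural-gradient step for a single real-valued stochastic (Bernoulli) neuron.
import numpy as np

def natural_gradient_step(w, X, y, lr=0.1, damping=1e-4):
    p = 1.0 / (1.0 + np.exp(-X @ w))                  # firing probability
    grad = X.T @ (p - y) / len(y)                     # gradient of the log-loss
    # Fisher information of the Bernoulli neuron: E[p(1-p) x x^T]
    F = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    return w - lr * np.linalg.solve(F + damping * np.eye(len(w)), grad)

# Usage on synthetic data generated by a known weight vector
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
w_true = np.array([1.0, -1.0, 0.5, 0.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ w_true))).astype(float)
w = np.zeros(4)
for _ in range(50):
    w = natural_gradient_step(w, X, y)
```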
Gradient Descent for Spiking Neural Networks
Much of the work on neural computation is based on network models of static neurons that produce analog output, despite the fact that information processing in the brain is predominantly carried out by dynamic neurons that produce discrete pulses called spikes. Research in spike-based computation has been impeded by the lack of an efficient supervised learning algorithm for spiking networks. Here,...
Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms
1 Context. Given a finite set of m examples $z_1, \ldots, z_m$ and a strictly convex differentiable loss function $\ell(z, \theta)$ defined on a parameter vector $\theta \in \mathbb{R}^d$, we are interested in minimizing the cost function $\min_\theta C(\theta) = \frac{1}{m} \sum_{i=1}^{m} \ell(z_i, \theta)$. One way to perform such a minimization is to use a stochastic gradient algorithm. Starting from some initial value $\theta[1]$, iteration t consists in picking ...
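A minimal sketch of the iteration being described, assuming a least-squares loss ℓ(z, θ) = ½(zᵀθ − y)² and a constant step size (the excerpt is cut off before the step-size schedule is specified):

```python
# Stochastic gradient descent on an empirical cost: at each step, pick one
# example at random and update theta with the gradient of its individual loss.
import numpy as np

def sgd(Z, targets, grad_loss, theta0, lr=0.01, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for _ in range(steps):
        i = rng.integers(len(Z))                      # pick one example at random
        theta -= lr * grad_loss(Z[i], targets[i], theta)
    return theta

# Usage with a least-squares loss l(z, theta) = 0.5 * (z . theta - y)^2
grad_ls = lambda z, y, theta: (z @ theta - y) * z
rng = np.random.default_rng(1)
Z = rng.normal(size=(100, 3))
targets = Z @ np.array([0.5, -1.0, 2.0])
theta_hat = sgd(Z, targets, grad_ls, np.zeros(3))
```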
Convergence of Gradient Descent Algorithm with Penalty Term for Recurrent Neural Networks
This paper investigates a gradient descent algorithm with a penalty term for a recurrent neural network. The penalty considered here is a term proportional to the norm of the weights; its primary role in the method is to control the magnitude of the weights. After proving that all of the weights remain automatically bounded during the iteration process, we also present some deterministic convergenc...
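The penalized update itself is simple: the gradient of the penalty is added to the loss gradient at every step, which is what keeps the weight magnitudes under control. A minimal sketch, assuming the common squared-norm (L2) form of the penalty rather than the exact form used in the paper, and omitting the recurrent-network specifics:

```python
# Gradient step with a weight penalty: penalty * w is the gradient of the
# term 0.5 * penalty * ||w||^2 added to the training loss.
import numpy as np

def penalized_step(w, grad, lr=0.01, penalty=1e-3):
    return w - lr * (grad + penalty * w)

# Usage: one update of a small weight vector given some loss gradient
w = np.array([0.5, -1.2, 3.0])
w = penalized_step(w, grad=np.array([0.1, 0.0, -0.2]))
```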
Journal
Journal title: Journal of Statistical Mechanics: Theory and Experiment
Year: 2021
ISSN: 1742-5468
DOI: https://doi.org/10.1088/1742-5468/ac3ae3